G-OSR: A Comprehensive Benchmark for Graph Open-Set Recognition
Dong, Yicong, He, Rundong, Chen, Guangyao, Zhang, Wentao, Han, Zhongyi, Shi, Jieming, Yin, Yilong
--Graph Neural Networks (GNNs) have achieved significant success in machine learning, with wide applications in social networks, bioinformatics, knowledge graphs, and other fields. Most research assumes ideal closed-set environments. However, in real-world open-set environments, graph learning models face challenges in robustness and reliability due to unseen classes. This highlights the need for Graph Open-Set Recognition (GOSR) methods to address these issues and ensure effective GNN application in practical scenarios. Research in GOSR is in its early stages and lacks a comprehensive benchmark spanning diverse tasks and datasets for evaluating methods. Moreover, traditional methods, Graph Out-of-Distribution Detection (GOODD), GOSR, and Graph Anomaly Detection (GAD) have mostly evolved in isolation, with little exploration of their interconnections or potential applications to GOSR. To fill these gaps, we introduce G-OSR, a comprehensive benchmark for evaluating GOSR methods at both the node and graph levels, using datasets from multiple domains to ensure fair and standardized comparisons of effectiveness and efficiency across traditional, GOODD, GOSR, and GAD methods. The results offer critical insights into the generalizability and limitations of current GOSR methods and provide valuable resources for advancing research in this field through systematic analysis of diverse approaches.

Graph learning, as a significant research direction in machine learning, has been widely applied in social network analysis, recommendation systems, bioinformatics, knowledge graphs, traffic planning, and the fields of chemistry and materials science [1]. Graph Neural Networks (GNNs) have demonstrated superior performance in various node classification and graph classification tasks [2]. These methods typically follow a closed-set setting, which assumes that all test classes are among the seen classes accessible during training [3]. However, in real-world scenarios, models are highly likely to encounter samples belonging to novel unseen classes, due to undersampling, out-of-distribution data, or anomalous samples, which can significantly impact the safety and robustness of models [4], as illustrated in Figure 1.

Figure 1: Closed-set classification cannot identify unseen classes, while open-set recognition can identify unseen classes and classify nodes belonging to seen classes.
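As a concrete illustration of the open-set decision rule contrasted in Figure 1, the following is a minimal sketch of maximum-softmax-probability thresholding, one common baseline for open-set recognition; it is not the procedure of any particular benchmarked method, and the function name, threshold value, and toy logits are illustrative assumptions.

```python
import numpy as np

def open_set_predict(logits, threshold=0.6, unseen_label=-1):
    """Classify among seen classes but reject low-confidence samples
    as unseen. `logits` has shape (n_samples, n_seen_classes)."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    conf = probs.max(axis=1)       # maximum softmax probability (MSP)
    preds = probs.argmax(axis=1)   # closed-set prediction over seen classes
    preds[conf < threshold] = unseen_label   # open-set rejection
    return preds

# Toy usage: three seen classes; the second sample is near-uniform.
logits = np.array([[4.0, 0.1, 0.2],    # confident -> kept as class 0
                   [1.1, 1.0, 0.9]])   # ambiguous -> rejected as unseen
print(open_set_predict(logits))        # [ 0 -1]
```

In the node-level setting, `logits` would come from a GNN's output over the seen classes; graph-level recognition would apply the same rule to logits computed from pooled graph representations.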
Weight Vector Tuning and Asymptotic Analysis of Binary Linear Classifiers
Niyazi, Lama B., Kammoun, Abla, Dahrouj, Hayssam, Alouini, Mohamed-Slim, Al-Naffouri, Tareq
Unlike its intercept, a linear classifier's weight vector cannot be tuned by a simple grid search. Hence, this paper proposes weight vector tuning of a generic binary linear classifier through the parameterization of a decomposition of the discriminant by a scalar which controls the trade-off between conflicting informative and noisy terms. By varying this parameter, the original weight vector is modified in a meaningful way. Applying this method to a number of linear classifiers under a variety of data dimensionality and sample size settings reveals that the classification performance loss due to non-optimal native hyperparameters can be compensated for by weight vector tuning. This yields computational savings, as the proposed method reduces to tuning a scalar, whereas tuning the native hyperparameter may involve repeated weight vector generation, with its attendant burden of optimization, dimensionality reduction, etc., depending on the classifier. It is also found that weight vector tuning significantly improves the performance of Linear Discriminant Analysis (LDA) under high estimation noise. Proceeding from this second finding, an asymptotic study of the misclassification probability of the parameterized LDA classifier is conducted in the growth regime where the data dimensionality and sample size are comparable. Using random matrix theory, the misclassification probability is shown to converge to a quantity that is a function of the true statistics of the data. Additionally, an estimator of the misclassification probability is derived. Finally, computationally efficient tuning of the parameter using this estimator is demonstrated on real data.
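To make the tuning workflow concrete: the paper's exact decomposition of the discriminant is not reproduced in this abstract, so the sketch below substitutes a hypothetical one-scalar family (a ridge-style interpolation between the pooled-covariance-inverse and nearest-centroid directions) purely to illustrate how grid-searching a single scalar replaces regenerating weight vectors. All function names, the grid, and the parameterization itself are assumptions.

```python
import numpy as np

def lda_weight(X0, X1, gamma):
    """Hypothetical one-scalar family of LDA-style weight vectors.
    gamma = 1 recovers plain LDA (pooled-covariance inverse); smaller
    gamma shrinks toward the nearest-centroid direction. A stand-in
    for the paper's decomposition, not a reproduction of it."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T)  # noisy estimate
    p = pooled.shape[0]
    w = np.linalg.solve(gamma * pooled + (1.0 - gamma) * np.eye(p),
                        mu1 - mu0)
    return w, 0.5 * (mu0 + mu1)  # weight vector and midpoint intercept

def validation_error(w, c, V0, V1):
    """Average misclassification rate of sign(w^T (x - c)) on a split."""
    return 0.5 * (np.mean((V0 - c) @ w > 0) + np.mean((V1 - c) @ w < 0))

def tune_scalar(X0, X1, V0, V1, grid=np.linspace(0.05, 1.0, 20)):
    """Grid search over the scalar: one linear solve per candidate,
    instead of regenerating the weight vector from scratch each time
    a native hyperparameter changes."""
    errs = [validation_error(*lda_weight(X0, X1, g), V0, V1) for g in grid]
    return grid[int(np.argmin(errs))]

# Toy usage: two Gaussian classes in p = 50 dimensions, n comparable to p.
rng = np.random.default_rng(0)
mu = np.r_[np.ones(5), np.zeros(45)]
X0, X1 = rng.normal(0, 1, (40, 50)), rng.normal(0, 1, (40, 50)) + mu
V0, V1 = rng.normal(0, 1, (200, 50)), rng.normal(0, 1, (200, 50)) + mu
print("selected gamma:", tune_scalar(X0, X1, V0, V1))
```

The held-out grid search here is only the simplest stand-in for selecting the scalar; the paper instead demonstrates tuning via its random-matrix-theory estimator of the misclassification probability, which avoids sacrificing data to a validation split.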